Coder Reliability and Misclassification in Comparative Manifesto Project Codings∗
Authors
Abstract
The long time series of estimated party policy positions generated by the Comparative Manifesto Project (CMP) is the only such time series available to the profession and has been extensively used in a wide variety of applications. Recent work (e.g. Benoit, Laver, and Mikhaylov 2007; Klingemann et al. 2006, chs. 4–5) focuses on non-systematic sources of error in these estimates that arise from the text generation process. Our concern here, by contrast, is with error that arises during the text coding process, since nearly all manifestos are coded only once by a single coder. First, we discuss reliability and misclassification in the context of hand-coded content analysis methods. Second, we report results of a coding experiment that used trained human coders to code sample manifestos provided by the CMP, allowing us to estimate the reliability of both coders and coding categories. Third, we compare our test codings to the published CMP “gold standard” codings of the test documents to assess accuracy, and produce empirical estimates of a misclassification matrix for each coding category. Finally, we demonstrate the effect of coding misclassification on the CMP’s most widely used index, its left-right scale, and draw conclusions for future use and design of the CMP data.
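The left-right index discussed in the abstract is built from the percentages of a manifesto's coded quasi-sentences falling into "right" versus "left" category groups. The following is a minimal illustrative sketch of that construction and of how a single misclassified batch of quasi-sentences shifts the score; the category names are invented stand-ins, not actual CMP coding categories.

```python
# Illustrative sketch only: a RILE-style left-right score is the share of
# quasi-sentences in "right" categories minus the share in "left" categories.
# Category names below are hypothetical, not real CMP codes.

def rile_style_score(counts, right, left):
    """Return (right% - left%) from per-category quasi-sentence counts."""
    total = sum(counts.values())
    right_pct = 100 * sum(counts.get(c, 0) for c in right) / total
    left_pct = 100 * sum(counts.get(c, 0) for c in left) / total
    return right_pct - left_pct

right = {"free_market", "defence"}
left = {"welfare"}

# A balanced manifesto: 50 right-category vs 50 left-category quasi-sentences.
clean = {"free_market": 30, "welfare": 50, "defence": 20}

# The same text with 10 "welfare" quasi-sentences misclassified as "free_market".
misclassified = {"free_market": 40, "welfare": 40, "defence": 20}

print(rile_style_score(clean, right, left))          # balanced -> 0.0
print(rile_style_score(misclassified, right, left))  # shifted right -> 20.0
```

Because the index differences two percentage sums over the same total, a misclassification that moves counts across the left-right divide is double-counted in the score (here a 10-point coding error produces a 20-point shift), which is why coder error matters so much for this index.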
Related Articles
Coder Reliability and Misclassification in the Human Coding of Party Manifestos
The Comparative Manifesto Project (CMP) provides the only time series of estimated party policy positions in political science and has been extensively used in a wide variety of applications. Recent work (e.g., Benoit, Laver, and Mikhaylov 2009; Klingemann et al. 2006) focuses on nonsystematic sources of error in these estimates that arise from the text generation process. Our concern here, by ...
Redefining the Nation: Center-right Parties and Visible Minorities in Europe
Dear CPW participants, Thank you for reading my prospectus. I must warn you all that it is unfinished, and thus, unpolished. I look forward to any and all comments; however, I seek your collective wisdom on the following topics: 1. My current dependent variable for my statistical model is based on the Comparative Manifesto Project. However, I would like to test whether or not outreach occurred ...
Computing Reliability for Coreference Annotation
Co-reference annotation is annotation of language corpora to indicate which expressions have been used to co-specify the same discourse entity. When annotations of the same data are collected from two or more coders, the reliability of the data may need to be quantified. Two obstacles have stood in the way of applying reliability metrics: incommensurate units across annotations, and lack of a c...
Methodological Challenges in Estimating Tone: Application to News Coverage of the U.S. Economy
Machine learning methods have made possible the classification of large corpora of text by measures such as topic, tone, and ideology. However, even when using dictionary-based methods that require few inputs by the analyst beyond the text itself, many decisions must be made before a measure of any kind is produced from the text. When coding media, the analyst must decide on the universe of medi...
Evaluating Text Segmentation
This thesis investigates the evaluation of automatic and manual text segmentation. Text segmentation is the process of placing boundaries within text to create segments according to some task-dependent criterion. An example of text segmentation is topical segmentation, which aims to segment a text according to the subjective definition of what constitutes a topic. A number of automatic segmente...